We have built a simulation environment for haplodiploid simulations. As a proof-of-concept, we compare simple simulations of haplodiploid populations with simulations of didplodiploid populations.
In each simulation, we create a population with 1000 males and 1000 females. Each individual has a genome with 1,000,000 loci. At each generation, each individual produces gametes (with a mutation rate of \(1\times10^{-8}\) and recombination of \(\times10^{-5}\)) and mates with another individual of the opposite sex. The alleles produced by mutation have a given selective coefficient, which we change depending on the simulation condition. The inheritance of the genome onto the next generation is based on the sum of the selective coefficients of the alleles carried by each individual. In the haplodiploid simulation, males are haploid, produced by the non-fertilised gametes of diploid females. In diploid simulations, both males and females are diploid, and there is no sex chromosome.
We ran the simulations where we varied the selection coefficient of new mutations. We ran the following simulations:
Each simulation was ran for 40,000 generations after a run-in of 10,000 generations. For each simulation, we recorded the number of novel alleles that have become fixed (cumulatively) every 100th generation.
Haplodiploid populations have a lower effective population size than diploid populations. This means that there will be fewer neutral mutations emerging in a haplodiploid population than in a diploid population with the same number of individuals, but that each mutation has a higher chance of becoming fixed. These two processes are expected to balance out, so that there is no difference in substitution rate between populations.
An essential point is that the fixation rate of neutral mutations is equal to the mutation rate. For diploids, the fixation probability of a new mutation is \(1/(2N)\) (as it is present in that proportion when it appears) and the number of new mutations per generation is \(2N\mu\), so that the fixation rate (fixation probability \(\times\) mutation rate per generation) is:
\(1/(2N) \times 2N\mu = \mu\)
For haplodiploids, the same is true:
\(1/(1.5N) \times 1.5N\mu = \mu\)
In our simulations, we have a mutation rate of \(1\times10^{-7}\) and a genome with \(100,000\) loci, giving us an expectation of 0.01 mutations per generation. If we run our simulations for a total of 5^{4} generations, we would expect a total of 500 fixed mutations.
Let us start by using a burnin period of 10,000 generations.
We measure the fixation rate by counting how many mutations have become fixed at the end of the simulation.
As expected, the average number of fixed alleles per generation is approximately 0.01 for both ploidy types.
## selection haplodiploid diploid haplodiploid_rate diploid_rate rate_difference
## 1 s = 0% 397.035 389.82 0.009925875 0.0097455 0.000180375
## difference_per
## 1 2%
However, the diploid simulations seem to have slightly fewer fixed mutations than the diploid simulations.
## dominance selection statistic p.value
## 1 h = 0% s = 0% 16405 0.001873824
## method alternative
## 1 Wilcoxon rank sum test with continuity correction two.sided
In simulations, it can take many generations for drift and mutation to balance. This is likely to explain the slightly smaller number of fixed mutations in the diploid population. This is because difference between haplodiploids and diploids is already quite large after 10000 generations, and it does not seem to increase much more in after another 40,000 generations.
We thus tested whether using larger burn-ins decreased the difference between haplodiploid and diploid simulations:
## burnin selection haplodiploid diploid haplodiploid_rate diploid_rate
## 1 10000 s = 0% 397.035 389.820 0.009925875 0.009745500
## 2 11000 s = 0% 387.705 382.235 0.009941154 0.009800897
## 3 12500 s = 0% 373.815 369.070 0.009968400 0.009841867
## 4 15000 s = 0% 350.080 346.240 0.010002286 0.009892571
## 5 20000 s = 0% 300.305 297.515 0.010010167 0.009917167
## rate_difference difference_per statistic p.value
## 1 0.0001803750 2% 16405.0 0.001873824
## 2 0.0001402564 1% 17309.5 0.019962628
## 3 0.0001265333 1% 17731.0 0.049713044
## 4 0.0001097143 1% 18245.5 0.129167100
## 5 0.0000930000 1% 18680.5 0.253833745
## method alternative
## 1 Wilcoxon rank sum test with continuity correction two.sided
## 2 Wilcoxon rank sum test with continuity correction two.sided
## 3 Wilcoxon rank sum test with continuity correction two.sided
## 4 Wilcoxon rank sum test with continuity correction two.sided
## 5 Wilcoxon rank sum test with continuity correction two.sided
From these results, we conclude that using a longer burn-in period decreases the difference between haplodiploids and diploids at the last generation. We therefore take a burn-in of 15,000 hereafter.
For the neutral simulation, we expect the haplodiploid and diploid simulations to have the same fixation rate. Below, each line is a single simulation. The rate of fixation is the slope of those lines, or, in other words, the average number of fixed alleles per generation.
To measure the fixation rate, we can look at the average number of fixed alleles at the end of the simulations.
As expected, the average number of fixed alleles per generation is approximately 0.01 for both ploidy types.
## selection haplodiploid diploid haplodiploid_rate diploid_rate rate_difference
## 1 s = 0% 350.08 346.24 0.01000229 0.009892571 0.0001097143
## difference_per
## 1 1%
We then test if the values are normally distributed:
##
## Shapiro-Wilk normality test
##
## data: final_neutral_for_wilcox_test$diploid
## W = 0.99135, p-value = 0.2787
##
## Shapiro-Wilk normality test
##
## data: final_neutral_for_wilcox_test$haplodiploid
## W = 0.99299, p-value = 0.4592
As expected, there is no difference between fixation rate of neutral mutations in simulations of haplodiploid and diploid populations.
## selection haplodiploid diploid haplodiploid_rate diploid_rate rate_difference
## 1 s = 0% 350.08 346.24 0.01000229 0.009892571 0.0001097143
## difference_per statistic p.value
## 1 1% 18245.5 0.1291671
## method alternative
## 1 Wilcoxon rank sum test with continuity correction two.sided
For the simulation where all mutations were advantageous, we expect the haplodiploid populations to fix alleles quicker, where the males (which are haploid) are under selection as soon as the mutation appears, whereas the mutation needs to be at high enough frequency to be in homozygous individuals before it is under selection. This is obvious in the simulations with the most highly advantageous mutations (S = +10%).
Again, to measure the fixation rate, we can look at the average number of fixed alleles at the end of the simulations.
The fixation rate is higher than in the neutral simulations for all classes. As expected, haplodiploids have higher fixation rates than diploids.
## selection haplodiploid diploid haplodiploid_rate diploid_rate
## 1 s = 0.1% 1024.31 774.155 0.02926600 0.02211871
## 2 s = 0.3% 2131.66 1270.655 0.06090457 0.03630443
## 3 s = 1% 4353.90 2089.120 0.12439714 0.05968914
## rate_difference difference_per
## 1 0.007147286 32%
## 2 0.024600143 68%
## 3 0.064708000 108%
The fixation rates are not normally distributed:
## # A tibble: 6 × 5
## # Groups: ploidy, selection [6]
## ploidy selection statistic p.value method
## <fct> <fct> <dbl> <dbl> <chr>
## 1 diploid s = 0.1% 0.991 0.227 Shapiro-Wilk normality test
## 2 diploid s = 0.3% 0.995 0.802 Shapiro-Wilk normality test
## 3 diploid s = 1% 0.992 0.390 Shapiro-Wilk normality test
## 4 haplodiploid s = 0.1% 0.982 0.0131 Shapiro-Wilk normality test
## 5 haplodiploid s = 0.3% 0.990 0.162 Shapiro-Wilk normality test
## 6 haplodiploid s = 1% 0.992 0.387 Shapiro-Wilk normality test
All comparisons are statistically significant:
## selection statistic p.value
## 1 s = 0.1% 0 4.787268e-67
## 2 s = 0.3% 0 4.798804e-67
## 3 s = 1% 0 4.801980e-67
## method alternative
## 1 Wilcoxon rank sum test with continuity correction two.sided
## 2 Wilcoxon rank sum test with continuity correction two.sided
## 3 Wilcoxon rank sum test with continuity correction two.sided
For the deleterious mutations, we expect that the fixation rate will be lower than that of neutral or advantageous mutations. Interestingly, none of of the highly deleterious mutations (S=-10%) was fixed in the population.
We can look at the fixation rate of the other two mutation types.
The difference between haplodiploid and diploid observed for midly deleterious mutations is very large.
## selection haplodiploid diploid haplodiploid_rate diploid_rate
## 1 s = -0.03% 197.100 217.630 5.631429e-03 6.218000e-03
## 2 s = -0.05% 127.895 147.375 3.654143e-03 4.210714e-03
## 3 s = -0.1% 37.205 45.590 1.063000e-03 1.302571e-03
## 4 s = -0.3% 0.040 0.060 1.142857e-06 1.714286e-06
## 5 s = -1% 0.000 0.000 0.000000e+00 0.000000e+00
## rate_difference difference_per
## 1 -5.865714e-04 -9%
## 2 -5.565714e-04 -13%
## 3 -2.395714e-04 -18%
## 4 -5.714286e-07 -33%
## 5 0.000000e+00 NaN%
The number of fixed mutations is normally distributed for some of the simulations, but this is not a sufficient justification to use a t-test instead of a Wilcoxon rank sum for these simulations only:
## # A tibble: 8 × 5
## # Groups: ploidy, selection [8]
## ploidy selection statistic p.value method
## <fct> <fct> <dbl> <dbl> <chr>
## 1 diploid s = -0.03% 0.983 0.0189 Shapiro-Wilk normality test
## 2 diploid s = -0.05% 0.987 0.0584 Shapiro-Wilk normality test
## 3 diploid s = -0.1% 0.996 0.907 Shapiro-Wilk normality test
## 4 diploid s = -0.3% 0.252 0 Shapiro-Wilk normality test
## 5 haplodiploid s = -0.03% 0.996 0.835 Shapiro-Wilk normality test
## 6 haplodiploid s = -0.05% 0.986 0.0431 Shapiro-Wilk normality test
## 7 haplodiploid s = -0.1% 0.989 0.129 Shapiro-Wilk normality test
## 8 haplodiploid s = -0.3% 0.176 0 Shapiro-Wilk normality test
However, the difference observed for the weakly deleterious mutations is not significant.
## selection haplodiploid diploid haplodiploid_rate diploid_rate
## 1 s = -0.03% 197.100 217.630 5.631429e-03 6.218000e-03
## 2 s = -0.05% 127.895 147.375 3.654143e-03 4.210714e-03
## 3 s = -0.1% 37.205 45.590 1.063000e-03 1.302571e-03
## 4 s = -0.3% 0.040 0.060 1.142857e-06 1.714286e-06
## 5 s = -1% 0.000 0.000 0.000000e+00 0.000000e+00
## rate_difference difference_per statistic p.value
## 1 -5.865714e-04 -9% 33275 1.584650e-30
## 2 -5.565714e-04 -13% 35436 1.119552e-40
## 3 -2.395714e-04 -18% 32678 5.066054e-28
## 4 -5.714286e-07 -33% 20494 2.466463e-01
## 5 0.000000e+00 NaN% 20000 NaN
## method alternative
## 1 Wilcoxon rank sum test with continuity correction two.sided
## 2 Wilcoxon rank sum test with continuity correction two.sided
## 3 Wilcoxon rank sum test with continuity correction two.sided
## 4 Wilcoxon rank sum test with continuity correction two.sided
## 5 Wilcoxon rank sum test with continuity correction two.sided
The simulation with more deleterious mutations have very few mutations becoming fixed.
## # A tibble: 10 × 3
## # Groups: selection [5]
## selection ploidy simulations_with_fixed_mutations
## <fct> <fct> <int>
## 1 s = -0.03% haplodiploid 200
## 2 s = -0.03% diploid 200
## 3 s = -0.05% haplodiploid 200
## 4 s = -0.05% diploid 200
## 5 s = -0.1% haplodiploid 200
## 6 s = -0.1% diploid 200
## 7 s = -0.3% haplodiploid 7
## 8 s = -0.3% diploid 12
## 9 s = -1% haplodiploid 0
## 10 s = -1% diploid 0
For the simulation where s = -0.3%, these simulation
runs had one or more mutations becoming fixed.
## # A tibble: 19 × 4
## selection ploidy simulation_id number_fixed
## <fct> <fct> <dbl> <dbl>
## 1 s = -0.3% diploid 1 1
## 2 s = -0.3% diploid 100 1
## 3 s = -0.3% diploid 112 1
## 4 s = -0.3% diploid 122 1
## 5 s = -0.3% diploid 171 1
## 6 s = -0.3% diploid 172 1
## 7 s = -0.3% diploid 194 1
## 8 s = -0.3% diploid 196 1
## 9 s = -0.3% diploid 25 1
## 10 s = -0.3% diploid 34 1
## 11 s = -0.3% diploid 59 1
## 12 s = -0.3% diploid 66 1
## 13 s = -0.3% haplodiploid 10 1
## 14 s = -0.3% haplodiploid 114 1
## 15 s = -0.3% haplodiploid 117 1
## 16 s = -0.3% haplodiploid 120 1
## 17 s = -0.3% haplodiploid 181 1
## 18 s = -0.3% haplodiploid 190 1
## 19 s = -0.3% haplodiploid 80 2
For this, we ran simulation with the following parameters: * Larger genome of 1e6 loci * Smaller mutation rate of 1e-8 * Larger population of 2000
## selection dominance haplodiploid_rate diploid_rate difference_per
## 1 s = 0.1% h = 0% 0.02924986 0.02212743 32%
## 2 s = 0.1% h = 25% 0.03339200 0.02967186 13%
## 3 s = 0.1% h = 50% 0.03802357 0.03874143 -2%
## 4 s = 0.1% h = 75% 0.04283343 0.04920943 -13%
## 5 s = 0.1% h = 100% 0.04813114 0.06043971 -20%
## dominance selection statistic p.value
## 1 h = 0% s = 0.1% 0.0 4.789695e-67
## 2 h = 100% s = 0.1% 40000.0 4.807457e-67
## 3 h = 25% s = 0.1% 77.5 1.531093e-66
## 4 h = 50% s = 0.1% 27872.0 9.840289e-12
## 5 h = 75% s = 0.1% 40000.0 4.798331e-67
## method alternative
## 1 Wilcoxon rank sum test with continuity correction two.sided
## 2 Wilcoxon rank sum test with continuity correction two.sided
## 3 Wilcoxon rank sum test with continuity correction two.sided
## 4 Wilcoxon rank sum test with continuity correction two.sided
## 5 Wilcoxon rank sum test with continuity correction two.sided
## # A tibble: 10 × 6
## # Groups: ploidy, dominance, selection [10]
## ploidy dominance selection statistic p.value method
## <fct> <fct> <fct> <dbl> <dbl> <chr>
## 1 diploid h = 0% s = 0.1% 0.995 0.763 Shapiro-Wilk normality te…
## 2 diploid h = 100% s = 0.1% 0.995 0.727 Shapiro-Wilk normality te…
## 3 diploid h = 25% s = 0.1% 0.991 0.225 Shapiro-Wilk normality te…
## 4 diploid h = 50% s = 0.1% 0.989 0.137 Shapiro-Wilk normality te…
## 5 diploid h = 75% s = 0.1% 0.992 0.368 Shapiro-Wilk normality te…
## 6 haplodiploid h = 0% s = 0.1% 0.992 0.357 Shapiro-Wilk normality te…
## 7 haplodiploid h = 100% s = 0.1% 0.995 0.805 Shapiro-Wilk normality te…
## 8 haplodiploid h = 25% s = 0.1% 0.992 0.310 Shapiro-Wilk normality te…
## 9 haplodiploid h = 50% s = 0.1% 0.988 0.0896 Shapiro-Wilk normality te…
## 10 haplodiploid h = 75% s = 0.1% 0.994 0.597 Shapiro-Wilk normality te…
Haplodiploid populations have 1.5N chromosomes in the population, whereas diploid populations have 2N chromosomes. We re-ran the haplodiploid simulations with a larger population size (K = 2667).
## dominance haplodiploid diploid haplodiploid_rate diploid_rate
## 1 h = 0% 1238.060 774.460 0.03537314 0.02212743
## 2 h = 100% 2188.975 2115.390 0.06254214 0.06043971
## 3 h = 25% 1449.430 1038.515 0.04141229 0.02967186
## 4 h = 50% 1680.310 1355.950 0.04800886 0.03874143
## 5 h = 75% 1925.360 1722.330 0.05501029 0.04920943
## rate_difference difference_per selection statistic p.value
## 1 0.013245714 60% s = 0.1% 0.0 4.791717e-67
## 2 0.002102429 3% s = 0.1% 4869.5 3.908280e-39
## 3 0.011740429 40% s = 0.1% 0.0 4.788953e-67
## 4 0.009267429 24% s = 0.1% 0.0 4.797994e-67
## 5 0.005800857 12% s = 0.1% 3.0 5.017313e-67
## method alternative
## 1 Wilcoxon rank sum test with continuity correction two.sided
## 2 Wilcoxon rank sum test with continuity correction two.sided
## 3 Wilcoxon rank sum test with continuity correction two.sided
## 4 Wilcoxon rank sum test with continuity correction two.sided
## 5 Wilcoxon rank sum test with continuity correction two.sided
We remake the deleterious figure, as we do not want to include simulations with strongly deleterious mutations, in which almost no simulation had any fixed mutation.
## quartz_off_screen
## 2
## quartz_off_screen
## 2
## quartz_off_screen
## 2
We want to show that the non-Wright-Fisher simulations of diploid populations are equivalent Wright-Fisher simulations. We re-ran the simulations of adaptive alleles using a simple Wright-Fisher model.
## selection statistic p.value
## 1 s = 0.1% 21727.5 0.1352004
## 2 s = 0.3% 19291.5 0.5402583
## 3 s = 1% 20149.0 0.8977918
## 4 s = 0% 20073.0 0.9499901
## method alternative
## 1 Wilcoxon rank sum test with continuity correction two.sided
## 2 Wilcoxon rank sum test with continuity correction two.sided
## 3 Wilcoxon rank sum test with continuity correction two.sided
## 4 Wilcoxon rank sum test with continuity correction two.sided
## quartz_off_screen
## 2